METHOD AND APPARATUS FOR BLIND SEPARATION OF AN 
OVERCOMPLETE SET MIXED SIGNALS 



PRIORITY CLAIM 

This application claims the benefit of priority to provisional applications number 
60/246,498, filed in the United States on November 9, 2000, and titled "Over-complete 
bss by applying sparse decomposition and information theoretic based probabilistic 
approach". 

BACKGROUND 

(1) Technical Field 

The present invention relates to the field of processing signals comprised of a mixture of 
signals from a plurality of sources, and more particularly to processing signals comprised 
of a mixture of signals from a plurality of sources in the case where the number of signal 
sources exceeds that of the number of detecting sensors and where the number of signal 
sources, the individual signals from the signal sources, and the mixing matrix are 
unknown. 

(2) Description of the Art 

Blind mixed signal separation is a phrase that describes the separation of signals from a 
plurality of sources when the number of sources, the individual signals from the signal 
sources, and the mixing matrix are unknown. When the number of sources exceeds the 
number of sensors used for receiving a mixed signal from the sources, the case is known 
as the "over-complete" case. In theory, it is possible to perfectly separate mixed signals 
if the number of signal sources is equal to or less than the number of sensors. However, 
the separation of mixed signals in the over-complete case is difficult and perfect 
separation is not possible, even theoretically. 

The separation of mixed signals is an issue in many situations, two important examples of 

which include cellular communications, especially in urban environments, and in spoken 

dialogue information retrieval on mobile platforms. In cellular communications, the 
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interference signals correspond to the signals that get reflected from various scatterers 
(multipath) such as buildings and noise. On the other hand, in spoken dialogue-based 
systems on mobile platforms, the interference signals correspond to other speakers and 
noise. The signal that is received at a sensor is a mixed signal that includes interference 
5 signals as well as the desired signals, together discussed herein as source signals. In 
these cases, it is not practical to know a priori the number of interfering signals (which 
are considered as different signal sources), and hence, it is not practical to use the same 
number of sensors, e.g., antennas in the case of cellular communication and microphones 
in the case of spoken dialogue-based systems, as that of the signal sources. It is therefore 
10 imperative to develop a signal separation system that can handle the over-complete case 
for efficient and clear cellular communication and for robust spoken dialogue-based 
;j5f information retrieval on mobile platforms. This is important to provide clear 
Q communication in the case of cellular phones and to improve speech recognition in the 
jjjj case of spoken dialogue-based information retrieval systems. 

is*;; 

* As stated, since the number and nature of source signals change, it is not practical to 

y ; know them a priori. Therefore, it is not always practical to apply signal separation 

W techniques that work well when the number of source signals is equal to the number of 

ijj sensors. Further, in this case, since how the signals get mixed (e.g., the mixing matrix) is 

20 unknown, it is necessary to apply blind techniques for the separation of the source 
signals. 



The solution of the over-complete case is a relatively recent topic within the research 
community. A few techniques have been developed, as discussed in the references 

25 provided at the end of this Background section. These techniques generally suffer from 
several drawbacks. They suffer from limited signal separation efficiency. Further, they 
lack robustness for different types of mixing matrices and signals. Additionally, they are 
computationally sluggish, making real-time implementation difficult. Finally, their 
theoretical limitations are difficult to ascertain, making them difficult to apply due to 

30 uncertainty regarding their performance. 
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More specifically, in M. Zibulevsky and B. A. Pearlmutter, "Blind source separation by 
sparse decomposition ," University of New Mexico technical report No. CS99-1, 1999, 
the estimation of the mixing matrix and source signals takes place separately, which does 
not allow for efficient separation of the mixed signals, since the estimation of the mixing 
matrix effects the estimation of the source signals, resulting in complex and expensive 
computation. In L. Q. Zhang, S. Amari and A. Cichocki, "Natural gradient approach to 
blind separation of over and under complete mixtures," Proceedings of ICA'99, Aussois, 
France 1999, a natural gradient approach to blind source separation of over and under- 
complete mixtures is described from a theoretical point of view. The method described 
makes use of Lie group structures in the source signals and uses Reimann metrics. A 
learning algorithm based on the minimization of mutual information is described. In Te- 
Won Lee, M. S. Lewicki, M. Girolami and S. J. Sejnowski, "Blind source separation of 
more sources than mixtures using overcomplete representations," IEEE Signal processing 
letters, Vol. 6, No. 4, pp. 87-90, April 1999, another probabilistic approach is described. 
This technique estimates the mixing matrix and the source signals separately, hence 
having the same disadvantage as Zibulevsky et al., mentioned above. The mixing matrix 
is estimated by considering it as basis vectors and an approximated learning rule is 
applied. In this approximation, it is assumed there is no additive noise and that there 
exists temporal independence of the samples of the mixtures. The technique in Lee et al., 
is demonstrated only for use with clean mixed signals and a fixed mixing matrix. In H-C. 
Wu, J. C. Principe and D. Xu, "Exploring the time-frequency microstructure of speech 
for blind source separation,", ICASSP'98, pp. 1 145-1 148, an approach based on the 
concept of thinning and estimating the spatial directions of the mixing matrix is applied 
for blind source separation. This approach is not probabilistic-based, which limits its 
ability to separate source signals. Furthermore, the approach appears to be effective only 
when the number of sensors is equal to the number of sources. In P. Bofill and M. 
Zibulevsky, "Blind separation of more sources than mixtures using sparsity of their short- 
time fourier transform," Proc. Of ICA workshop, July 1999, pp. 87-92, a probabilistic 
approach is described in which the mixing matrix and the source signals are estimated 
separately, thus suffering from the same disadvantage as Zibulevsky et al., mentioned 
above. Finally, all of these references suffer from the drawback of an unknown 
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theoretical performance bound. Thus, a user operating a system based on one of these 
techniques cannot know the theoretical limitations of their system. 

It is desirable to provide a system for solving the over-complete case that overcomes 
these limitations and that estimates the mixing matrix jointly to provide an advantage of 
efficiently separating the mixed signals by taking into account the effect of estimating the 
mixing matrix on the estimate of the source signals, and which allows for efficient 
convergence on a solution. 

The following references are provided to assist the reader in gaining more knowledge 
regarding the state of the art in this technical area. 

References: 

1. M. Zibulevsky and B. A. Pearlmutter, "Blind source separation by sparse 
decomposition," University of New Mexico technical report No. CS99-1, 1999. 

2. L. Q. Zhang, S. Amari and A. Cichocki, "Natural gradient approach to blind 
separation of over and under complete mixtures," Proceedings of ICA'99, Aussois, 
France 1999. 

3. Te-Won Lee, M. S. Lewicki, M. Girolami and S. J. Sejnowski, "Blind source 
separation of more sources than mixtures using overcomplete representations," IEEE 
Signal processing letters, Vol. 6, No. 4, pp. 87-90, April 1999. 

4. H-C. Wu, J. C. Principe and D. Xu, "Exploring the time-frequency microstructure of 
speech for blind source separation", ICASSP'98, pp. 1145-1148. 

5. P. E. Gill, W. Murray and M. H. Wright, Practical optimization . Chapter 3, Academic 
Press, 1981. 
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6. P. Bofill and M. Zibulevsky, "Blind separation of more sources than mixtures using 
sparsity of their short-time fourier transform," Proc. Of ICA workshop, July 1999, pp. 
87-92. 

7. B. A. Olshausen and D. J. Field, "Sparse coding with an overcomplete basis set: A 
strategy employed by VI?," In Vision Research, vol. 37, pp. 331 1-3325, 1997. 

SUMMARY 

The present invention provides an apparatus, a method, and a computer program product 
for separating mixed signals into estimates of their source signal components. 
Specifically, the apparatus is designed to allow for the efficient separation of an 
overcomplete set of mixed signals. 

In one embodiment, the apparatus comprises a data processing a data processing system 
including an input for receiving mixed signals from a plurality of sensors, where the 
sensors are configured to receive mixed signal samples. The mixed signal samples 
comprise a mixture of signals transmitted from signal sources through an environment 
and noise. A signal processor is attached with the input for receiving the mixed signals 
from the sensors. A memory is attached with the signal processor for storing data during 
operations of the signal processor. The data processing system further comprises means 
for storing data representing the input from the sensors in a mixed signal matrix X; means 
for storing data representing the noise in a noise matrix V; means for storing data 
representing an estimate of the individual signals from the mixture of signals from the 
signal sources in a source signal estimate matrix S ; means for storing data representing 
an estimate of the effects of the environment in a estimated mixing matrix A where the 
matrices are related by X = AS + V ; means for generating an initial estimate of the 

estimated mixing matrix A ; means for determining the number of signal sources and 
associated lines of correlation of each of the signal sources from the estimated mixing 

matrix A , and for representing the signal sources in the source signal estimate matrix S ; 

means for jointly optimizing the source signal estimate matrix S and the estimated 
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mixing matrix A in an iterative manner, to generate an optimized source signal estimate 
matrix S and a final estimated mixing matrix A ; and means for restoring the separated 
source signals from the optimized source signal estimate matrix S , whereby a plurality of 
mixed signals from unknown sources traveling through an environment with added noise 
may be separated so that the original, separate signals may be reconstructed. 

In a further embodiment, the means for generating an initial estimate of the estimated 
mixing matrix A comprises means for transforming the mixed signal matrix X into the 
sparse domain using a transform operator; means for determining a frequency band 
within the sparse domain that contains the most information that can be used to determine 
lines of correlation to determine the number of signal sources; means for determining a 
measure and an optimal threshold for the measure for the determination of noise within 
the frequency band; means for recalculating the measure used in the determination of the 
noise within the frequency band using the optimal threshold; and means for determining 
the local maxima of a distribution of the measure, where the local maxima represent 
angles which are inserted into the estimated mixing matrix A to provide an initial 
estimate of the estimated mixing matrix A . 

In a still further embodiment, the means for jointly optimizing the source signal estimate 
matrix S and the estimated mixing matrix A in an iterative manner, to generate an 

optimized source signal estimate matrix S and a final estimated mixing matrix A 
comprises means for clustering the mixed signal samples using a geometric constraint; 
and means for evaluating a convergence criteria based on the clustered mixed signal 
samples to determine whether the convergence criteria are met, and if the convergence 
criteria are not met, iteratively adjusting the clustering of the mixed signal samples and 
parameters of the geometric constraint to create a new set of clusters until the 
convergence criteria are met, to provide a final estimated mixing matrix A . 
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In a still further embodiment, the means for jointly optimizing the source signal estimate 

A A 

matrix S and the estimated mixing matrix A in an iterative manner, to generate an 

A A 

optimized source signal estimate matrix S and a final estimated mixing matrix A further 
comprises means for obtaining a multi-band sparse domain estimate of the source signal 

A A A 

5 estimate matrix S using the relationship X = AS + V ; and means for using the adjusted 
geometric constraint corresponding to the final estimated mixing matrix A in each of the 
bands of the sparse domain for the source signal estimate matrix S and determining 

A 

whether a convergence criteria is met for the source signal estimate matrix S , and if the 
convergence criteria are not met, iteratively adjusting the clustering of the mixed signal 
10 samples to create a new set of clusters until the convergence criteria are met, to provide a 

final source signal estimate matrix S . 

In accordance with the present invention, it is preferable that the means for transforming 
the mixed signal matrix X into the sparse domain using a transform operator is a Fourier 

A 

1 5 transform operator such that the estimated mixing matrix A is represented in the Fourier 
transform, and wherein the means for obtaining a multi-band sparse domain estimate of 

A A A 

the source signal estimate matrix S using the relationship X = AS + V uses a wavelet 



O transform operator to obtain the multi-band sparse domain estimate. 

I* 

20 It is also preferable that the means for generating an initial estimate of the estimated 

A 

mixing matrix A comprises means for transforming the mixed signal matrix X into the 
frequency domain using a Fourier operator; means for using a mutual information 
criterion to determine a frequency band within the sparse domain that contains the most 
information that can be used to determine lines of correlation to determine the number of 

25 signal sources; means for determining a random variable ang = arctan x fo an< ^ ? w here 

Xj(band) 

xi(band) and xj(band) represent Fourier values of mixture in the selected frequency band, 
and an optimal threshold ANG for ang, where the optimal threshold ANG is determined 
by computing the entropy E(ang, ANG) vs. ANG and searching for the optimal value of 
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ANG corresponding to the minimum rate of descent of the entropy E(ang, ANG); means 
for recalculating ang based on the optimal threshold ANG; means for using a standard 
peak detection technique to determine the number and values of local maxima of a 
histogram of ang where the local maxima represent angles which are inserted into the 
5 estimated mixing matrix A to provide an initial estimate of the estimated mixing matrix 
A. 



m 



It is also preferable that the means for jointly optimizing the source signal estimate matrix 

A A 

S and the estimated mixing matrix A in an iterative manner, to generate an optimized 

10 source signal estimate matrix S and a final estimated mixing matrix A comprises means 
for clustering the mixed signal samples in the Fourier domain along the lines of 
correlation with one cluster per source using a straight distance metric geometric 
constraint, with the clusters representing estimates of the Fourier domain representation 

of S , F(S) , where F represents a Fourier domain operator ; and means for evaluating a 
15 convergence criteria, min Xc T F(S) , with the convergence criteria, min )x T F(S) , 

developed from the log likelihood function L(F(S) | F(X), A) with the assumption of 
Laplanicity of source signals in the Fourier domain following the probability 

P(F(S)) = —e 1 1 , where c T = [1, 1, „.l] is a unit vector, with the convergence criteria, 
2 

. T A 

mm F(S) , evaluated based on the clustered mixed signal samples to determine 

20 whether the convergence criteria, min Xc T F(S) , is met, and if the convergence criteria, 

min x* t |f(S)|, is not met, iteratively adjusting the clustering of the mixed signal 
samples and parameters of the geometric constraint to create a new set of clusters until 
the convergence criteria, min hc r F(S)j , is met, to provide a final estimated mixing 

matrix A. 



25 
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It is additionally preferable that the means for jointly optimizing the source signal 
estimate matrix S and the estimated mixing matrix A in an iterative manner, to generate 
an optimized source signal estimate matrix S and a final estimated mixing matrix A 
further comprises means for obtaining a multi-band sparse domain estimate of the source 
5 signal estimate matrix S using the relationship X = AS + V , applied in the Wavelet 

domain; and means for using the adjusted geometric constraint corresponding to the final 
estimated mixing matrix A in each of the bands of the Wavelet domain for the source 
signal estimate matrix S , W(S) , and determining whether a convergence criteria, min 

Xc T W(S) is met for the source signal estimate matrix S , where the convergence 
10 criteria, min te T |w(S) , is developed from the log likelihood function 

L(W(S) | W(X), A) with the assumption of Laplanicity of source signals in the Wavelet 
domain following the probability P(W(S)) = — e~ U ' W(S) ' , where c T = [1, 1, ...1] is a unit 



: ;:s5a. 



2 

vector, and if the convergence criteria is not met, min te T |w(S)| , iteratively adjusting 
the clustering of the mixed signal samples to create a new set of clusters until the 
:X 15 convergence criteria, min )x T W(S) , is met, to provide a final source signal estimate 

matrix S. 



Two major applications of the present invention are separating acoustic signals for 
applications such as voice recognition and separating mixed radio frequency signals for 
20 applications such as separating cellular phone signals. 



The present invention may also be embodied as a method, in which the "means" 
discussed above are interpreted as steps operated on a data processing system as 
previously described. 

25 
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Additionally, the present invention may be embodied as a computer program product, in 
which the "means" discussed above are recorded on a computer readable medium such as 
an optical storage device (e.g., a CD or DVD). 



■'r.sis : 

:: :: 

w 



5 The "means" of the present invention are generally in the form of program logic that may 
be in the form of computer program code or may be embedded in hardware depending on 
the needs of a particular embodiment. 

BRIEF DESCRIPTION OF THE DRAWINGS 
10 These and other features, aspects, and advantages of the present invention will become 
better understood with regard to the following description, appended claims, and 
accompanying drawings where: 

FIG. 1 is an illustrative diagram depicting an array of sensors, an environment, and a 
plurality of signal sources in order to depict the general operating environment of the 
1 5 present invention; 

FIG. 2 is a flow diagram depicting the general operation of the present invention; 
FIG. 3 is a block diagram depicting the components of a processing system used in the 
present invention; 

FIG. 4 is an illustrative diagram of a computer program product embodiment of the 
20 present invention; 

FIG. 5 is a more specific flow diagram depicting the operation of the present invention, 
illustrating a preferred embodiment for generating the initial estimate of the estimated 

mixing matrix A; 

FIG. 6 is a composite diagram depicting the results of operations corresponding to the 
25 flow diagrams, where: 

FIG. 6(a) is an example scatter plot of original mixed data; 

FIG. 6(b) is a bar graph depicting mutual information in each of several frequency sub- 
bands; 

FIG. 6(c) is a scatter plot of spectral values of a chosen sub-band; 
30 FIG. 6(d) is an angle histogram of the chosen sub-band; 
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FIG. 6(e) is a rate of change of entropy plot depicting threshold selection at a point of 
minimum rate of change of entropy; 

FIG. 6(f) is an angle histogram of the chosen sub-band after thresholding; 
FIG. 7 is a more specific flow diagram depicting the operation of the present invention, 
5 illustrating a preferred embodiment for jointly and iteratively optimizing the estimated 

mixing matrix A and the source signal estimate matrix S ; 

FIG. 8 is a graph depicting the application of the Crame-Rao (CR) bound apparatus of the 
present invention to develop CR bound curves of estimated source signals for different 

sizes of an estimated mixing matrix A ; 
10 FIG. 9 is a graph illustrating the application of the CR bound apparatus of the present 

invention applied to an example estimated mixing matrix A , depicting the experimental 
bound when the number of signal samples is 1600; 

FIG. 10 is a series of graphs of amplitude versus time depicting the separation of signals 
from a mixture containing three speech signals mixed with two noise signals, with the 
15 aggregate mixed signal received at two sensors across an environment, where: 

FIG. 10(a) depicts the original mixed signal including the three English speech signals 
mixed with the two noise signals; 

FIG. 10(b)(i) depicts a first original (unmixed) English speech signal; 
FIG. 10(b)(ii) depicts the first English speech signal after separation, for comparison with 
20 the original depicted in FIG. 1 0(b)(i); 

FIG. 10(c)(i) depicts a second original (unmixed) English speech signal; 

FIG. 10(c)(ii) depicts the second English speech signal after separation, for comparison 

with the original depicted in FIG. 10(c)(i); 

FIG. 10(d)(i) depicts a third original (unmixed) English speech signal; 
25 FIG. 10(d)(ii) depicts the third English speech signal after separation, for comparison 
with the original depicted in FIG. 10(d)(i); 

FIG. 1 1 is a series of graphs of amplitude versus time depicting the separation of signals 
from a mixture containing two mixed signals with three speech signals mixed with a 
Gaussian random noise signal, with the aggregate mixed signal received at two sensors 
30 across an environment, where: 
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FIG. 1 1(a) depicts the original two mixed signals including the three mixed language 
speech signals mixed with the two noise signals; 

FIG. 1 l(b)(i) depicts a first original (unmixed) mixed language speech signal; 
FIG. 1 l(b)(ii) depicts the first mixed language speech signal after separation, for 
5 comparison with the original depicted in FIG. 1 l(b)(i); 

FIG. ll(c)(i) depicts a second original (unmixed) mixed language speech signal; 
FIG. 1 l(c)(ii) depicts the second mixed language speech signal after separation, for 
comparison with the original depicted in FIG. 1 l(c)(i); 

FIG. 1 l(d)(i) depicts a third original (unmixed) mixed language speech signal; and 
1 0 FIG. 1 l(d)(ii) depicts the third mixed language speech signal after separation, for 
comparison with the original depicted in FIG. 1 l(d)(i). 

3 DETAILED DESCRIPTION 

^ The present invention relates to the field of processing signals comprised of a mixture of 

HI 1 5 signals from a plurality of sources, and more particularly to processing signals comprised 
of a mixture of signals from a plurality of sources in the case where the number of signal 
jh* sources exceeds that of the number of detecting sensors and where the number of signal 
Q sources, the individual signals from the signal sources, and the mixing matrix are 

2SJ unknown. The following description is presented to enable one of ordinary skill in the art 

!M ! 20 to make and use the invention and to incorporate it in the context of particular 

applications. Various modifications, as well as a variety of uses in different applications 
will be readily apparent to those skilled in the art, and the general principles defined 
herein may be applied to a wide range of embodiments. Thus, the present invention is 
not intended to be limited to the embodiments presented, but is to be accorded the widest 
25 scope consistent with the principles and novel features disclosed herein. 

In order to provide a working frame of reference, first a glossary of terms used in the 
description and claims is given as a central resource for the reader. Then, a brief 
introduction is provided in the form of a narrative description of the present invention to 
30 give a conceptual understanding prior to developing the specific details. 
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(1) Glossary 

Before describing the specific details of the present invention, it is useful to provide a 
centralized location in which various terms used herein and in the claims. The terms 
defined are as follows: 

Environment - The term "environment" refers generally to the physical environment 
across which signals are transmitted and mixed prior to sampling by the sensors. An 
environment typically creates problems with clear transmission such as multipath, 
distortion, and attenuation. 

Means - The term "means" as used in this invention generally indicates a set of 
operations to be performed on a computer. Non-limiting examples of "means" include 
computer program code (source or object code) and "hard-coded" electronics. The 
"means" may be stored in the memory of a computer or on a computer readable medium. 

Noise - The term "noise" generally includes any unwanted signals received at the 
sensors. For example, background music or murmur in the case of separating speech 
signals. 

Sensor - The term "sensor" refers to a mechanism for sampling data from the 
environment for delivery to a processor. The present invention is intended to be 
compatible with a wide range of sensors, two important broad classes of which include 
audio sensors (microphones) and radio-frequency sensors (antennae). 

(2) Introduction 
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The present invention is developed to solve the problem depicted in FIG. 1, wherein an 
illustrative diagram is presented, depicting an array of sensors x M (t) 100 and a plurality 
of signal sources s N (t) 102, where M and JV represent the number of sensors x M (i) 100 
and signal sources s N (i) 102, respectively, and where t represents time. The array of 
sensors x M (t) 100 is configured to sample the signals from the signal sources s N 

(0 102, 

where the signals are mixed and include noise and the effects of the environment 104. 
The signals may be combined instantaneously or convolutively, or both. How the signals 
are combined depends on an a priori unknown environment through which the signals 
transmit. 

It is provided that X is an M x T matrix corresponding to the output of M sensors at times 

t = 1 ,2, , T. In other words, the row / of X corresponds to the 7 th mixture signal It 

is further provided that S is an Nx T matrix of underlying source signals and that A is an 
unknown Mx Admixing matrix corresponding to the environment effect in mixing the 
signals that is observed at M sensors. Additionally, it is provided that V is an M x T noise 
matrix representing other noise from the environment. The observed signal X can then be 
represented mathematically in the matrix form as X = AS + V . The problem of blind 
source separation (BSS) is to recover the original source signals S from the observations 
of X (e.g., the samples of the signal at each sensor x M (i) 100) without the prior 
knowledge of source signals and the mixing matrix A, The blind source separation 
techniques that handle the cases (a) M-N, (b)N<M and (c)N>M are referred as 
regular, under and over-complete blind source separation, respectively. A principal focus 
of the present invention is the over-complete blind source separation problem since it is 
encountered in many practical applications. Although the present invention is equally 
applicable to regular and under-complete blind source separation, the discussion herein is 
focused on the over-complete blind source separation case. In the regular and under- 
complete cases, it is theoretically possible to separate the source signals from the mixed 
observed signals perfectly; however, it is not possible to separate signals perfectly in the 
over-complete case even theoretically. Due to additive noise and rectangular nature of 
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the mixing matrix A, the solution of separating source signals from the mixed signals 
cannot be solved by finding the pseudo-inverse S = A + X . A more efficient technique is a 
probabilistic approach, which is the approach used by this invention. 

The separation of signals blindly using probabilistic approaches mainly corresponds to 
minimizing the log of a posterior likelihood function p(s|X, A)with respect to S. This 
likelihood function (conditional probability function) can be written in the form 

p(s|x, a) oc p(x|a, s)p(a, s) = p(x|a, s)p(a)p(s) 

by applying the Baye's theorem and assuming statistical independence between A and S. 
Here, P(A) and P(S) correspond to prior probabilities of A and 5, respectively. The result 

of applying the log operation to p(s|X,a) is z(s|X,a)=l(x|A,s)+Z(a)+Z(s) where 
L corresponds to log{PQ). The minimization of log likelihood function of p(s|X, a) then 
corresponds to minimizing z(x|A, s)+ L(s) with respect to S since there is no prior 

information on A. Since the separation of S depends on A, by jointly optimizing the 
above log likelihood function with respect to both A and S the source signals may be 
separated from the observations more efficiently. 

A flow diagram depicting the general operation of the present invention is shown in FIG. 
2. The present invention generally comprises several operations, which are typically 
performed as software operating on a data processing system. Although discussed in the 
form of "means plus function" so as to encompass software and hardware versions as 
well as a variety of mathematical and coding techniques, this discussion of "means" 
applies equally to steps in the method of the present invention. 

First, in order to separate the source signals from a mixed signal, the invention 

incorporates, within the data processing system, a means for storing data representing the 

input from the sensors in a mixed signal matrix X 200. The actual sensor input may be 

stored in the mixed signal matrix X 200. A noise matrix Vis also allocated by a means 
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for storing data representing the noise in a noise matrix F202. The invention further 
incorporates a means for storing an estimate of the individual signals from the mixture of 

signals from the signal sources in a source signal estimate matrix S 204. In addition, the 
invention also incorporates a means for storing data representing an estimate of the 

effects of the environment in an estimated mixing matrix A 206. The matrices are 

related by X = AS + V . The means 200, 202, 204, and 206 may be operated in any 
sequence, or may be operated in parallel - their ordering in FIG. 2 was selected 
arbitrarily. The means 200, 202, 204, and 206 are typically in the form of a data structure 
within memory. 

After allocation of the storage area for, and the population of, the mixed signal matrix X 
with signal (data) samples, a means for generating an initial estimate of the estimated 

A 

mixing matrix A 208 is operated. Subsequently, with the results, a means for 
determining the number of signal sources and associated lines of correlation of each of 

A 

the signal sources from the estimated mixing matrix A , and for representing the signal 

sources in the source signal estimate matrix S 210 is operated in order to prepare for the 
joint optimization procedure. Next, a means for jointly optimizing the source signal 

A. /V 

estimate matrix S and the estimated mixing matrix A in an iterative manner 212 is 
operated in order to generate an optimized source signal estimate matrix S and a final 
estimated mixing matrix A . Once the optimized source signal estimate matrix S and the 
final estimated mixing matrix A have been obtained, the signals are restored by a means 
for restoring the separated source signals from the optimized source signal estimate 

matrix S 214. As a result, a plurality of mixed signals from unknown sources traveling 
through an environment with added noise may be separated so that the original, separate 
signals may be reconstructed. 

Details of the various physical embodiments of the present invention are discussed 

A 

below. Then, the means for generating an initial estimated mixing matrix A 208 and for 

A 

jointly optimizing the initial estimated mixing matrix A and a source signal estimate 
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matrix S 212, and for restoring the signals 214 are presented in more detail Finally, an 
apparatus and technique are presented for determining a theoretical Crame-Rao (CR) 
bound for the initial estimated mixing matrix A and a source signal estimate matrix S is 
provided to allow for an understanding of the operational limits of an embodiment of the 
5 present invention. 

(3) Physical Embodiments of the Present Invention 

The present invention has three principal physical embodiments. The first is an apparatus 
for blind signal separation in the form of a data processing system, typically operating 
software to separate signals received from a plurality of sensors from a plurality of 
sources. The second physical embodiment is a method, typically software, operated on a 
data processing system. The third principal physical embodiment is a computer program 
product. The computer program product generally represents computer readable code 
stored on a computer readable medium such as an optical storage device, e.g., a compact 
disc (CD) or digital versatile disc (DVD), or a magnetic storage device such as a floppy 
disk or magnetic tape. Other, non-limiting examples of computer readable media include 
hard disks and flash-type memories. These embodiments will be described in more detail 
below. 

20 A flow diagram depicting the components of a processing system used in the present 

invention is provided in FIG. 3. The data processing system 300 comprises an input 302 
for receiving data from a plurality of sensors and an output 304 for providing output to a 
user. The output provided to the user is typically in the form of estimated separated 
source signals, or intermediate steps along the way toward generating the estimated 

25 separated source signals. On the other hand, the output could also comprise the 

theoretical bound for a blind source separation system. The input 302 and the output 304 
are both coupled with a signal processor 306, which may be a general-purpose computer 
processor or a specialized processor designed specifically for use with the present 
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invention. The signal processor 306 is coupled with a memory 308 to permit storage of 
data and software to be manipulated by commands to the processor. 



An illustrative diagram of a computer program product embodying the present invention 
5 is depicted in FIG. 4. The computer program product 400 is depicted as an optical disk 
such as a CD or DVD. However, as mentioned previously, the computer program 
product generally represents computer readable code stored on any compatible computer 
readable media. 



iu 10 (4) Generating the Initial Estimated Mixing Matrix A 



o 



Before describing the joint optimization apparatus and technique, a description of the 

generation of the initial estimated mixing matrix A is provided. This description also 
serves to help clarify the concepts behind the joint optimization. 

15 When mixed signals are sparse, the distribution of data in the mixture space forms a set 
of elongated clusters along the directions of the columns of the mixing matrix A. 
Therefore, a goal in obtaining an initial estimate of the mixing matrix A is to determine 
lines or directions (angles) of maximum data density. Sparsity refers to only a small 
number of coefficients in the mixing matrix A in the transformed domain differing 

20 significantly from zero. 



This description of the technique for generating an initial estimate of the mixing matrix A 
is set in the context of an example with three independent speech sources, two sensors, 
and two noise sources. The observed mixed signal matrix X is a 2xT matrix, where each 
25 row of X represents one of the observed signals (from the point of view of the sensors), xi 
and X2. The mixing matrix A 9 in this case, is a 2x3 matrix. The source signal matrix S is 
a 3xT matrix, with each row representing one of the three signal sources. The noise 
matrix Fis a 2xT matrix, with each row representing one of the two noise sources. 
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A more specific flow diagram depicting the operation of the present invention, 
illustrating a preferred embodiment for generating the initial estimate of the estimated 
mixing matrix A , is depicted in FIG. 5. To aid in the understanding of the steps involved 
5 in generating the initial estimate of the estimated mixing matrix A , a composite diagram 
depicting the results of operations corresponding to the flow diagrams, is presented in 
FIG. 6. More specifically, an example scatter plot of original mixed data is presented in 
FIG. 6(a); a bar graph depicting mutual information in each of several frequency sub- 
bands is presented in FIG. 6(b); a scatter plot of spectral values of a chosen sub-band is 
10 presented in FIG. 6(c); an angle histogram of the chosen sub-band is depicted in FIG. 
|j 6(d); an rate of change of entropy plot depicting threshold selection at a point of 

minimum rate of change of entropy is depicted in FIG. 6(e); and an angle histogram of 
the chosen sub-band after thresholding is depicted in FIG. 6(f). 



^ 1 5 As shown in FIG. 5, the means for generating an initial estimate of the estimated mixing 
matrix A 208 (as depicted in FIG. 2) is illustrated in with greater specificity. As was the 
case during the discussion of FIG. 2, the "means" described here are typically in the form 
p of computer program code operating on the mixed signal data in a series of steps. The 

parts of the means 208 may be operated, for example, as specific subroutines in a larger 
20 program. 



The means 208 comprises a means for transforming the mixed signal matrix X into the 
sparse domain using a transform operator 500. In a preferred embodiment, the means 
500 is a Fourier transform operator (e.g., a spectrogram) for transforming the mixed 
25 signal matrix X into the frequency domain. To illustrate the benefits of transforming the 
mixed signal matrix X into the sparse domain, an example scatter plot of original mixed 
data is presented in FIG. 6(a). The scatter plot is obtained by plotting the time domain 
values of one mixture x 2 versus the time domain values of the other mixture xi. From 
this plot, it can be seen that the values form one cloud from which it is impossible to 
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distinguish different source signals. Since examination of the time domain values 
provides no insight, it is better to examine the mixtures xi and x 2 in the frequency 
domain, where the data may be represented sparsely. The motivation for examining the 
received mixtures in the transform domain is that the application of either the Fourier 
5 domain (spectrogram) or the wavelet transform will result in a representation of the 
mixtures that is sparse and that yields information about the underlying source signals 
and how they are mixed together by the mixing matrix. The choice of the Fourier domain 

for the estimate mixing matrix A is preferred because finer resolution can be obtained, 
and hence, a more accurate estimation of the mixing matrix A can be generated. The 
10 observed signals are transformed to the Fourier domain by applying the spectrogram. In 
m the examples provided herein, including the depictions in FIG. 6 and detailed further 

2! below, the spectrogram was computed using a Hamming window of length 16 samples. 

55 The window was shifted by 8 samples. The spectrogram transforms the observed mixture 

M 

jy signals into frequency bands. The number of frequency bands is equal to half the length 

;|^: 15 of the FFT+1 (Fast Fourier Transform + 1) if the signals are real. Hence, in the example 
* shown herein, there are 9 frequency bands. 

Next, a means for determining a frequency band within the sparse domain that contains 
ih*> the most information that can be used to determine lines of correlation to determine the 

20 number of signal sources 502 is operated. A preferred mechanism for this determination 
is a mutual information criterion. This means 502 is applied to determine which 
frequency band provides the most information about the lines of correlation of the source 
signals from the spectrogram values of the mixtures xi and x 2 . This frequency band is 
determined by applying the maximum mutual information criterion. Mutual information 

25 is a measure of uncertainty remaining about an observation (event) Z, for example, after 

observing another observation Y. It is defined as /(Z, Y) - H(Z) - H (z\y) where H(Z) is 

K 

the entropy of Z, and is given by H(Z) = -^p(z k )log(p(z k ) and #(Z|7) is the 

k=\ 

conditional entropy, given by #(z|y) = H(Z, Y) - H(Y) . By applying the mutual 
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information criterion, a frequency sub-band may be found in which uncertainty about one 
observed mixture is maximally resolved by observing another mixture* In essence, by 
applying the maximum mutual information criterion a frequency sub-band is being found 
in which the directions of spread of observed mixtures are as well resolved as possible. 
5 Note that in the example just discussed, Z is the group of spectrogram values in a 

particular frequency band of X2, and Y is the group of spectrogram values of Xi from the 
same frequency band. By applying the mutual information criterion, a frequency band 
may be found in which uncertainty about one observed mixture is maximally resolved by 
observing another mixture. A bar graph depicting mutual information in each of several 
10 frequency sub-bands is presented in FIG. 6(b). The figure plots bars representing the 
;u mutual information versus the 9 frequency bands used for this example. From the graph, 
j S it is clear that frequency band #3 (the third bar from the left) has the highest mutual 

Q information. It can be seen by generating a scatter plot of the spectrogram values of x 2 
i; l vs. xi from frequency band #3, depicted in FIG. 6(c), the scatter plot shows some 

W 15 structure, and contains what appear to be three lines or directions of correlation, one for 



each source (highlighted by black lines in the figure). By applying the mutual 
information criterion, a frequency band in which the directions of correlations are as 
resolved as possible is found. 



20 To further improve the resolution of these directions in the chosen frequency band, a 
thresholding technique is applied. Thus, a means for determining a measure and an 
optimal threshold for the measure for the determination of noise within the frequency 
band 504 is operated. A means for recalculating the measure used in the determination of 
the noise within the frequency band using the optimal threshold 506 is then operated. 

25 Because of the close relationship between means 504 and 506, they are described 
together here. The preferred form of the measure is a random variable 

ang = arctan x $ an d) ^ Xi(band) and xj(band) represent Fourier values of mixture 
Xj(band) 

in the selected frequency band. The optimal threshold in this case may be represented by 
ANG, which is determined by computing the entropy E(ang 9 ANG) vs. ANG and 
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searching for the optimal value of ANG corresponding to the minimum rate of descent of 
the entropy E(ang 9 ANG). 

After the mixtures are observed at the two sensors and transformed into the sparse 
5 domain, the random variable ang may be calculated. If the mixtures xi and X2 were made 
from spatially white additive noise (e.g., a mixing matrix of 0), then the variable ang 
would have values that are uniformly distributed (indicating no "lines" of correlation). 
On the contrary, if the signal sources si, S2, and S3 are spatially localized sources with 
narrow variances in the selected frequency band, then the resulting distribution of the 
10 variable ang would be described by a multi-modal distribution with three peaks in the 
distribution corresponding to the directions of correlations introduced by the mixing 
matrix. Since both noise and localized sources are present in the example used herein, the 

%<& 

O effect of the noise is to potentially mask or hide the correlation directions and thereby 

Q 

y prohibit determination of all of the correlation directions. The masking effect may be 

Wjj 15 seen in FIG. 6(d), which is an angle histogram of the chosen sub-band in terms of the 
iry variable ang. The three vertical lines 600 correspond to the three directions depicted in 

the scatter plot of FIG. 6(c). Although it is possible to resolve the correlation directions 

if* 

If of the source signals from the scatter plot of FIG. 6(c), it is not possible to resolve all of 

h Q them from the histogram of the variable ang. If the spectral values due to noise that are 

2*f 20 in the center of FIG. 6(c), marked by a circle 602, could be removed by thresholding, 
then it would be possible to resolve all of the correlation directions. Therefore, it is 
desirable to apply thresholding to allow for resolution of all of the correlation directions. 
An angle histogram of the chosen sub-band is depicted in FIG. 6(d), from which it is 
clear that the thresholding operation applied to the observed mixtures should reduce the 
25 masking of the maxima of a histogram of ang. 

The next issue is how to choose a value of the threshold. The approach used is based on 

measuring the entropy of the random variable ang as a function of a threshold ANG 

applied to the spectral values of the mixtures in the selected frequency band. Masking 

30 uniform distribution (the presence of noise) tends to increase the entropy. By increasing 

the threshold value, noise may be reduced and entropy decreased, eventually to a point 
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where signal sources begin to be removed as well. A function, E(ang, ANG) is defined, 
equal to the entropy of ang obtained by thresholding the spectral values of the mixtures 
xi and x 2 with the threshold value set to ANG. The value of ANG is increased until the 
rate of descent of function E(ang, ANG) is minimum within a chosen set of values. The 
5 value ANG that minimized E(ang 9 ANG) is then selected as the threshold value to 
threshold the observed mixtures in the sparse Fourier domain. ANG, in this example, 
corresponds to a range of values between [0. 1 * Axb 2 * Axb] where Axb = cov(spectal 
values in the chosen frequency sub-band). To compute E(ang 9 ANG) , all spectral 
coefficients of the spectral values of the mixtures xi and x 2 in the selected frequency band 
10 whose magnitudes are less than or equal to ANG are set to 0. Next, ang is calculated, 
iy : and a histogram is generated for the values of ang. Then for each set threshold value, 
y E(ang, ANG) is computed using the entropy definition mentioned above along with the 
O histogram values of ang. An Entropy plot depicting threshold selection at a point of 
I J minimum entropy is depicted in FIG. 6(e). The threshold value corresponding to the 
$V 15 minimum rate of descent of the entropy function E(ang, ANG) is selected automatically, 

Mi and is represented in the FIG. 6(e) by the minimum 604. An angle histogram of the 

. . ~, 

j; . chosen sub-band after thresholding is depicted in FIG. 6(f). From the histogram in FIG. 

Q 6(f), three local maxima are clearly visible, which correspond to three lines of correlation 

i! ji due to the mixing of the three source signals. These local maxima (both the number of 

^ 20 local maxima and their location) are automatically detected and the resulting angles 6 

are used to create the initial estimate of of the estimated mixing matrix A , where 

A. w ,=[cos(^);sin(^)]. 



Finally, means for determining the local maxima of a distribution of the measure 508 is 
25 operated, where the local maxima represent angles which are inserted into the estimated 

A A 

mixing matrix A to provide an initial estimate of the estimated mixing matrix A . For 
this purpose, a standard peak detection technique may be used to determine the number 
and values of local maxima of a histogram of ang where the local maxima represent 

angles which are inserted into the estimated mixing matrix A to provide an initial 

30 estimate of the estimated mixing matrix A . 
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Next, a discussion of the joint optimization technique of the present invention is 
provided. 

As. 

(5) Jointly Optimizing the Initial Estimated Mixing Matrix A and the Source Signal 
5 Estimate Matrix S , and Restoring the Signals 

A more specific flow diagram depicting the operation of a preferred embodiment for 
jointly and iteratively optimizing the estimated mixing matrix A and the source signal 
estimate matrix S is depicted in FIG. 7. The flow chart depicted in FIG. 7 corresponds to 
10 the means for jointly optimizing the source signal estimate matrix S and the estimated 

mixing matrix A in an iterative manner 212 depicted in FIG. 2. As previously 
mentioned in regard to the generation of the initial estimate of the estimated mixing 
matrix A , the "means" discussed herein are typically in the form of computer program 
code operating in a data processing system which operates generally in a series of steps. 

!M* 15 

Q First, in the optimization of the estimated mixing matrix A a means for clustering the 

!5J mixed signal samples is performed using a geometric constraint 700 is operated, 

iM ! preferably in the Fourier domain, along with the lines of correlation. This is preferably 

accomplished using one cluster per source along with a straight distance metric geometric 
20 constraint, with the clusters representing estimates of the Fourier domain representation 

of S , F(S) , where F represents a Fourier domain operator. 

Next, a means for evaluating a convergence criteria based on the clustered mixed signal 
samples 702 is used to determine whether the convergence criteria are met, and if the 
25 convergence criteria are not met 704, iteratively adjusting the clustering of the mixed 
signal samples and parameters of the geometric constraint 706 to create a new set of 
clusters until the convergence criteria are met 704, to provide a final estimated mixing 

matrix A . The convergence criteria is preferably in the form, min )x T F(S)| , and is 
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developed from the log likelihood function L(F(S) | F(X), A) with the assumption of 
Laplanicity of source signals in the Fourier domain following the probability 

P(F(S)) = — e 1 1 , where c T = [1, 1, ...1] is a unit vector. This convergence criteria, 

2 

min ^c t |f(S) , is evaluated based on the clustered mixed signal samples to determine 

5 whether it is met, and it is not, the clustering of the mixed signal samples is iteratively 
adjusted along with parameters of the geometric constraint to create a new set of clusters 

until convergence occurs. The result is a final estimated mixing matrix A . 

a 

Next, in the optimization of the estimate of the source signal estimate matrix S a means 

A 

10 for obtaining a multi-band sparse domain estimate of the source signal estimate matrix S 
using the relationship X = AS + V 708 is operated, preferably in the wavelet domain. 

Next, a means for using the adjusted geometric constraint corresponding to the final 
estimated mixing matrix A in each of the bands of the sparse domain for the source 

15 signal estimate matrix S 710 is used, and a determination is made regarding whether a 

a 

convergence criteria is met 712 for the source signal estimate matrix S , and if the 
convergence criteria are not met 712, iteratively adjusting the clustering of the mixed 
signal samples to create a new set of clusters 714 until the convergence criteria are met, 

A 

to provide a final source signal estimate matrix S . The adjusted geometric constraint 

A 

20 corresponding to the final estimated mixing matrix A is used in each of the bands of the 

A A 

Wavelet domain for the source signal estimate matrix S , W(S) . The convergence 
criteria is preferably in the form min Xc T W(S) . The convergence criteria, min 



W(S) , is developed from the log likelihood function L(W(S) | W(X), A) with the 



assumption of Laplanicity of source signals in the Wavelet domain following the 
25 probability P(W(S)) = ^ e ^ T l W(S) l t w h ere c T = [1, 1, ... 1] is a unit vector. If the 

convergence criteria is not met, the clustering of the mixed signal samples is iteratively 
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adjusted to create a new set of clusters until the convergence criteria, X,c T W(S) 



, is met, 



providing a final source signal estimate matrix S . Greater detail with regard to the joint 
optimization is provided below. 



5 For the joint optimization problem, it is assumed that the source signals are statistically 
independent of each other and follow a Laplacian probability distribution function in the 
sparse Fourier and wavelet domains. It has been generally observed that the Fourier and 
wavelet coefficients of signals do exhibit Laplacian behavior. It is also assumed that 
noise V is Gaussian (i.e., normally distributed). These are not strong assumptions since 
10 in practice source signals are generally statistically independent of each other and are 
^ normally distributed. 

?t In order to reduce the complexity of the problem of separation of mixed signals, the 

|W sparse domain is used by applying the wavelet/short-time Fourier transform. This has 

■; ys 

|ll j 1 5 another advantage of reducing the noise effect - by thresholding the wavelet/Fourier 

coefficients we can achieve de-noising. The probabilistic approach of blind source 
!h* separation discussed herein is then applied. The observed mixed signals in the 

1 J transformed domain can be written as: W(x) = AW(s) + W(y) where W is either the 

2 Fourier or wavelet transform. This has the same form as the mixed observed signals in 
20 the time domain. Therefore, without loss of generality, the problem of blind source 

separation in the signal domain and in the transformed sparse domain can be considered 
equivalent. Therefore, the general probabilistic approach mentioned in the previous 
section applies in the transformed sparse domain. To get the separated source signals 
back from the transformed domain to the time domain, the inverse wavelet or Fourier 
25 transform is applied. 

For the development of theory of joint optimization, the log likelihood function 
l(fT(s)|^(X), a) in the sparse domain is used. This theory is described briefly below to 

provide a better background for the reader. This joint optimization algorithm is generally 
30 referred to as the "dual update technique" herein. 
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(5)(a) The Dual Update Technique 

Laplacianity of source signals in the sparse domain is assumed to follow the prior 
probabilityi > (^S;) = -e"^ T|ns)i , where c T =[1,1,-1] is a unit vector. By applying the 

5 "Laplacianity" of signals, "Gaussianity" of noise and no prior information on A, it can be 
shown that: 

l(^3;|A,^^ where R^ (v) is 

the noise covariance matrix. For the mathematical simplicity , it is assumed that the 
noise covariance matrix is an identity matrix. However, the proposed "dual update" 
J J 10 approach works for non-Gaussian noise with covariance greater than unity. With unit 

0 covariance assumption and re-writing the above equation in terms of t = 1,2, • • Tthe result 

St is: 

1 L(W(S)\W(X\A)= £(JP(x,) - A^(s,)) 2 + te T W(s t ) , (1) 

si y t=i 

iu s where x, and s, are the column vectors of X and S. For the estimation of the 

1 5 unknown mixing matrix A, the above equation is differentiated with respect to A and set 

dL(w(S)\W(X) a) t 
equal to zero. By doing this, v W| v h } = 2V(W(x t ) - AW($ t )) T W(s[) = 0 is 

8A ~f 

obtained. By using the individual components of W(S), A and by replacing the 
summation with the expectation operation, this equation can be written as 

Ety{x t ) T W{$\)}=E\ ^a T jW(sj) T W(s))\. E fr&fW (*{)}=() for i * ./based on the 
10=1 J J 

20 assumption that the source signals are statistically independent in the sparse domain. 
Using this result, the equation immediately above can be written as: 
2 xs = k > a 2 >' * • a M JS S where S s is the cov(fF(S)) = E§V(S) T W(S)} and 
S xs = E§¥(X) T W(S)} = [cr XSi , (j XS2 • * • <t Xsn ] . The estimated A matrix becomes: 

A = [a 1 ,a 2 -,a M ]=E xs Zs 1 or AZ S =S XS . (2) 



si: 

0% 
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Equation (2) can be viewed as a constraint on the values of £, such that if A is chosen 
according to equation (2), then the first term in equation (1) will be minimized with 
respect to A. Equation (2) also provides a means for estimating A 9 given an estimate of S, 
which is used to refine the estimate of A. 

5 

There is no closed form solution to solve this set of equations. However, they can be 
solved by applying the Linear Equality Constraints (LEC) optimization technique. A 
comprehensive treatment of the LEC is provided in P. E. Gill, W. Murray and M. H. 
Wright, Practical optimization. Chapter 3, Academic Press, 1981. LEC techniques are 
10 well known in the literature, and generally correspond to the following general 
framework: 

W ; 

f% minimize F(x) 

^ subject to Bx = c 

IfU where B is an m x n matrix and its i* row is denoted by bf that contains the coefficients of the i 01 linear 

contraint : bfx = a ix x x + a i2 x 2 + • • ■ + a in x n = c r 

;s ; 

jj^ 15 The LEC corresponds to finding x under the linearity constraint such that some function 
; 2f of x F(x) is minimized. However, to apply the constraint, it is necessary to know a B that 

!M corresponds to finding lines or directions of lines. It has been shown that the necessary 

condition for F(x*) to be a local minimum of LEC for some feasible x*, its projected 
gradient vector at x* should be equal to zero. Therefore, the LEC optimization problem 
20 can be solved by applying the line search together with the projection gradient method. 
An example of a method for finding the lines or direction of the lines is by applying 
Armijo rules of line search. This technique is preferred for use with the present 
invention. 



25 By applying the above described LEC, Equation (1) iteratively by using the following 
two steps: 
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Find W(S) that minimizes Xc T W(S) , which corresponds to minimizing 



A A 

F(W(S) , under the linear constraint A2 S = 2^ from equation (2) for a given 
choice of A\ and 

Use W(S) and estimate A = 2 xs Sg , which corresponds to finding lines or 



5 



directions of lines. 



The goal of this general approach is to determine feasible directions to search in, namely 
a direction that meets linear equality constraints. Once feasible directions have been 
found, an update step along the feasible directions is calculated, with the effect of 
decreasing the value of the function being minimized. These two steps are iterated until a 
10 convergence criterion is met. Specifically, from the above set of equations, it can be seen 

that a good initial estimate of A is needed to get a good estimate of 5, i.e. the S that is 
used in the step 1 above. As mentioned above it is of interest to find directions of lines 
(angles of lines). Therefore, the A matrix is considered as a function of as a function 




15 



of*. 



In summary, the joint optimization technique works as follows: 



1 . From the knowledge of the estimated mixing matrix A , the number of signal 
sources and their lines or directions are known; 



20 



2. Clusters are developed along the lines of correlation with one cluster for each 
source signal. Every spectrogram value in the selected frequency band is 
assigned to a cluster (in this case, the clusters are two-dimensional plots of xi and 
X2). As an initial guess, the spectrogram values of xi and X2 are assigned to a 
cluster on the basis of a straight distance matrix (other techniques known in the art 

could also be substituted). The clusters represent estimates of W(S) ; 



25 



3. Using the current estimate of W(S) , the function Ac r W (S) is evaluated to 



determine if a convergence has been reached. If the convergence criterion is not 
met, then step 2 is performed again to adjust the assignment of the spectrogram 
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values of xi and xi to create a new estimate of W(S) . Creating clusters of the 

spectrogram values along the lines of correlation specified by the mixing matrix A 
ensures that the constraint imposed by equation 2 (above) is met, 

(5)(b) Time-Courses Restoration of the Source Signals 

The next step in the proposed over-complete blind source separation is the restoration of 
separated source signals. For this, the final estimate of A obtained from the technique 
described in the last section is used, and the observed signals are transformed into 
wavelet sparse domain. Note that throughout this description that, although the Fourier 
and wavelet domains are preferred, any other transform into a sparse domain may be 
used. The wavelet sparse domain is used for the reconstruction of source signals because 
in the case of the spectrogram, the phase information is lost and the restored separated 
source signals will not be accurate. The question arises, why not apply the proposed "dual 
update" technique in the wavelet sparse domain and estimate both A and restore the 
separated source signals? This could be done; however, the proposed dual update 
technique will estimate A up to permutations which require finding the proper order for 
the source signal separation in each wavelet sub-band. In order to overcome this problem 
the "dual update" technique is applied twice - once in the sparse Fourier domain and 
once in the sparse wavelet domain. Only the final estimate of A matrix that resulted after 
the "dual update" technique satisfies the stopping criterion in the Fourier domain is kept. 
This matrix is fixed while restoring the source signals in the wavelet domain and 
minimizing the log likelihood function 

mm(l(^(S)|A,r(X))) 

where W(S) is the transformed version of S in the wavelet domain and W(X) is the 
transformed version of X in the wavelet domain. As described above, equation (2) is 
used as the linear constraint on W{ X ) . The minimization problem is solved using the 
same techniques that were used to solve the minimization problem in the Fourier domain. 
The only difference in this case is that the estimate of A is not changed, only S is updated. 
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The iterative procedure is stopped when the criterion is met. For this minimization the 
Armijo line search rule is preferred. 

In summary, the steps followed thus far are: 

1 . Create an initial estimate of the mixing matrix A using the Fourier domain with 
spectrogram coefficients in the frequency band that is determined to provide the 
best separation information, noting that the technique used to create the initial 
estimate of A automatically determines the number of sources (N); 

2. Create a final estimate of the mixing matrix A using the dual update technique in 
the Fourier domain; and 

3. Create an estimate of the source signal matrix S using the dual update algorithm 
applied in the wavelet domain to generate the reconstructed signals. 

In the estimation theory it is well known that a good estimator is the one whose 
estimation error variance is close to the theoretical Cramer-Rao bound (CRB). The 
performance of the present invention may be evaluated by comparing it with the CRB. 
For this comparison it is necessary to provide an apparatus for generating the CRB. In 
the next two sections, the derivation of the CRB for A and S are provided, respectively. 

(6) Developing the Theoretical Crame-Rao (CR) Bound for the Initial Estimated Mixing 
Matrix A 

In general A = [a, a 2 • • • a N ] where a. is a column vector of size M. As mentioned 
before, the angle of directions are of interest here. In other words the angle of directions 
of each columns of A must be determined. For mathematical simplicity consider a two 
sensors case. This case has been considered throughout this description for ease of 
visualization, but in general, the number of sensors with which the present invention is 
useful is unlimited. In this case, 
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A = [a x a 2 ] where a. = 



a 



2i 



with/ = 1,2,-- -M and0.=arctan 



viniiy 



. Thus, 



11(3) = 



cos(^)' 
sin(^) 



. Since directions are of primary interest, the matrix A is considered in 



the form A = A{0) = u(0 t ) . Now, consider the derivative of A with respect to 0 { . 



^-A(0) = A u( 0 () = A 
BO, d0, ' 80, 

Next consider that 



cos(0 t ) 
[sin(^)J 



-sm(0,) 
cos^) 



0 -lTcos^.) 

1 oJ[sin(^)J 



= pu(^) where p = 



0 -1 

1 0 



^(W{X) - MfiWW) = -^[ W W ~ Z^W.) J = -pu(0,)s f , and that 

A[( r(X ) - A(0)W(S)) T R-; (V) (FT(X) - A(0)W(S))]=-\pu(0 i )J (W(X) - A(0)JV(S))fV( Si ). 
Thus, 

80? 

-!^p T R-J iV) (W(X)-A(0)W(S)yV^^ (3) 



\{W(X) - A(0)W(S)) T RJ? (V) {W(X) - A(0)W(S))} = 



Now for the CRB derivation, consider the log likelihood function 

L{w(s)\w(x)M0))=T(nx,)-Mm&)Y 

t=\ t=i 
that is used for the estimation of A and S. The second order differentiation of this 

likelihood function with respect to 0 is as follows: 



since the 



8 2 l(w(S)\W(X)A(0)) ^Z(^W-WWfBAv,N-A(TO) 

60? = 80? 

second term does not depend on 0 . Substituting from equation (3), the above equation 
can be shown to be: 
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0 



8 2 l(w(S)\W(X)A(0)) _ 
d~0f " 



2feu r (^)p r p r R^v)(^-A(^)+u r (^)p r R^ (v) pu(^0sf}. 



information matrix. The Fisher information matrix is I = 



where £ is the 



ni 10 E 



It is well known in estimation theory that the CR bound is the inverse of a Fisher 

f d 2 L 

[80,80, 

expectation operator, L is the log likelihood function, and #is the parameter vector that is 
being estimated. By applying the expectation operation on the above second-order 
derivative, 

a^ ( sr(x)A OT )| _ £ ^ W)prR ., vipu(Ws , } j , s obtained B can te 



•2 2 '2 

shown that £{s^ } = — . Substituting for E{s l t }, 

K 

^^ (S g (X)AW) } = .-(^)p^,,^,)f. Since ^^J-C-l 

other terms of the Fisher information matrix except the diagonal elements can be shown 
to be zero. Therefore, the CR bound for the estimation of 0. is given by 

1 5 E^jp; - 0. J }> ^ jlhi 7777 . Assuming the following structure for the noise 



2Tn I (0 i )p i R^ y) pn(0 i ) 



covariance: R^ (V) = 



°V(V) P°V(V) 

2 2 

P°V(V) °V(V) 



, the CR bound can be expressed as a function of 



2 



X 

mixture signal-to-noise ratio (SNR) = 2 * = -y-^ — . Using this result, the CR bound 



2 jr,T Z 



for ft is: 
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(4) 



5 From Equation 4, it can be seen that the CRB for the estimate of A matrix depends on the 
length of data T that is used in its estimation. In FIG. 8, the theoretical bound is plotted 
with respect to the data length (e.g., matrix size). More specifically, FIG. 8 is a graph 
illustrating the application of the CR bound apparatus of the present invention applied to 

s an example estimated mixing matrix A , depicting the experimental bound when the 

Q 10 number of signal samples is 1600; 



j; (7) Developing the Theoretical Crame-Rao (CR) Bound for the Source Signal Estimate 

Matrix S 

15 This derivation is similar to that developed in the last section. Instead of differentiating 

'lad'* 

9 the log likelihood function with respect to 0 { , here it is differentiated with respect to S. 

Ik* This second order differentiation is given by: 

a2l( ^ ( 3 s ^ 

- Asign(W(S))(W(X) - A(0)S) T R^ v) A(0) (5) 

- ;LA r (0)R-„! (V) (^(X) - A(0)S)sign(W(S T ) 
+ A 2 sign(W(S))sign(W(S)) T 

Applying an expectation operation to the above equation and assuming that S is 
20 statistically independent of noise and that the noise has zero mean, equation 5 may be 
written as: 
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E- 



d l{w(S)\w(X)a(6)) ]^ = E ^T iW 2 m (w(to-MW)(!r(n - A(*)sf < v) a(<?)} 



■ E{z 2 sign(W(S))sign(W(S)) T \ 



- 1 with probability = 0.5 



The first term in the above equation can be written as : 
e\a t (0)R-' (V) (W(X) - A(0)SX^(X) - A(0)S) r 

= A T (0)R^ {V) E^W(X) - A(0)SX^(X) - A(0)Sf }r^ V) A(0) = A T (0)R^ V) A(0) 
since E^WiX) - A(0)S)(W(X) - A(0)S) r }= R W(vy 

The second term can be written as : E{x 2 sign(W(S))sign(W(S)) T }= A 2 E^ign(W(S))sign(W(S)) T }= A 2 ] 

[ 1 with probability = 0.5 

since sign(W(S)) is a binary random varible sign(W(S)) = ■ 
Substituting for the first and the second term, 

As in the last section all the elements in the Fisher's information matrix except the diagonal 
elements are zero. Substituting the above equation in the diagonal elements of Fisher's information 
matrix, using the same structure for the noise covariance matrix as in the last section and taking the 
inverse of the Fisher's information matrix, the CR bound for the estimate of W(S) is : 



4^(S)-^(S)) 2 }>| 



1 o v * 
f A T (0)A(9) + A 2 l 



(6) 



From the above equation it is apparent that the accuracy of the estimate of the separated 
source signals depends on the angle of mixing matrix. In the FIG. 8, the CRB theoretical 
curves are plotted for different A matrices, where can be seen that when the matrix is 
2x2, the bound increases with SNR; where as for other mixing matrix sizes the bound 
reaches a plateau. This indicates that in the case of regular blind source separation (i.e., 
where the number of sources is equal to number of sensors), theoretically it is possible to 
separate the source signals perfectly. However this is not the case for over-complete 
blind source separation (i.e., where the number of sensors is less than the number of 
sources). 
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An operational example of the present invention is described below using speech signals 
and noise signals as source signals, and two sensors (microphones). This example is 
provided to illustrate the use of the present invention in an example context. The 
example used describes the use of the present invention in a relatively simple case, and is 
not intended to restrict the scope of the invention in any manner. 

(8) An Operational Example 
Two contexts were selected for this example. In the first, three speech signals from three 
female speakers were used. A noise signal was also included, corresponding to babble 
noise. The noise signal was considered as the noise source V. The noise source was 
randomly split into two parts to provide two noise source signals. The three speech 
signals and two noise signals were mixed by generating a 2x3 A matrix randomly. For 
the generation of this mixing matrix, first random numbers were generated using a 
uniform random number generator, these random numbers were then used in calculating 
the angles as mentioned above and these angles were next used in forming a A matrix. 
This mixing matrix was then used to mix three speech signals and two noise signals. As a 
result of this mixing two mixtures were developed corresponding to received signals at 
two sensors (microphones). From the two mixed signals, three speech signals were 
separated using the present invention. As mentioned above, this technique first estimates 
the A matrix in the Fourier domain by applying the dual update technique. The details of 
how the mixed signals were transformed to Fourier sparse domain and how the A matrix 
was estimated are provided above. To check how close the estimate of A matrix is to the 
theoretical CRB, the theoretical and the experimental bounds are plotted in FIG. 9. More 
specifically, FIG. 9 depicts a graph of the application of the Crame-Rao (CR) bound 
apparatus of the present invention to develop CR bound curves of estimated source 

signals for different sizes of an estimated mixing matrix A . From this figure, it can be 
seen that the experimental (bound plotted at one SNR and one T) in the figure is close to 
the theoretical bound. 
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This estimated A matrix is then used to estimate the source signals. The source signals 
are estimated in the wavelet domain as mentioned above. For the application of the 
wavelet transform, a wavelet packet approach was used. The wavelet packet approach 
generates a library of bases for a given orthogonal wavelet function. Each of these bases 
5 offers a particular way of decomposing or representing (expanding) a signal. The most 
suitable decomposition of a given signal is then selected by applying the minimum 
entropy constraint. For the wavelet packet decomposition, a bi-orthogonal wavelet of 
order 6 was used. Both the spectrogram (sparse Fourier domain transformation) and the 
wavelet transforms are well known in the signal processing community. The dual update 
10 technique was applied in the wavelet sparse domain to estimate the source signals as 
m described previously. The estimated source signals in the wavelet domain were 

)3J transformed to the time domain by applying the inverse wavelet transform. The results 

O are provided in FIG. 10, where FIG. 10 is a series of graphs of amplitude versus time 
y depicting the separation of signals from a mixture containing three speech signals mixed 

ill !j 

jj^ 15 with two noise signals, with the aggregate mixed signal received at two sensors across an 
environment, where: 

H 

} f FIG. 10(a) depicts the original mixed signal including the three English speech signals 

Q mixed with the two noise signals; 

•i - 

20 FIG. 10(b)(i) depicts a first original (unmixed) English speech signal; 

FIG. 10(b)(ii) depicts the first English speech signal after separation, for comparison with 
the original depicted in FIG. 10(b)(i); 

FIG. 10(c)(i) depicts a second original (unmixed) English speech signal; 
FIG. 10(c)(ii) depicts the second English speech signal after separation, for comparison 
25 with the original depicted in FIG. 10(c)(i); 

FIG. 10(d)(i) depicts a third original (unmixed) English speech signal; and 

FIG. 10(d)(ii) depicts the third English speech signal after separation, for comparison 

with the original depicted in FIG. 10(d)(i). 

30 From the results shown in FIG. 10, it can be seen that the three speech signals are well 

separated from the mixed signal. In order to quantify how well the signals were 
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jn b 

J, : 

in p 



IU 15 



;! - 



separated, (a) SNR of mixed signals was computed as: 
f varfs ) ^ 

SNR S . = 101og 10 — 7 , ° r i = 1,2,3 & = 1,2 , and (b) the SNR of estimated 



l mix, 
k 



varima. 



source signal was computed as: 



fflVRg = 101og 10 



' var( S/ ) ^ 



i = 1,2,3 . By comparing these two SNRs, an average 



enhancement of 1 1 dB was obtained. 

Next, in the second context, three speech signals of different languages - Spanish, 
English and Ukrainian - were used. Spanish and English sentences were spoken by two 
10 children and the Ukrainian was spoken by an adult male. In this case Gaussian random 

if*. 4 ?; 

;J s f noise was used. A series of graphs of amplitude versus time depicting the separation of 
ly signals from a mixture containing two mixed signals with three speech signals mixed 

with a Gaussian random noise signal, with the aggregate mixed signal received at two 
sensors across an environment is depicted in FIG. 1 1, where: 



FIG. 1 1(a) depicts the original two mixed signals including the three mixed language 
speech signals mixed with the two noise signals; 

FIG. 1 l(b)(i) depicts a first original (unmixed) mixed language speech signal; 
FIG. 1 l(b)(ii) depicts the first mixed language speech signal after separation, for 
20 comparison with the original depicted in FIG. 1 1 (b)(i); 

FIG. 1 l(c)(i) depicts a second original (unmixed) mixed language speech signal; 
FIG. 1 l(c)(ii) depicts the second mixed language speech signal after separation, for 
comparison with the original depicted in FIG. 1 l(c)(i); 

FIG. 1 l(d)(i) depicts a third original (unmixed) mixed language speech signal; and 
25 FIG. 1 l(d)(ii) depicts the third mixed language speech signal after separation, for 
comparison with the original depicted in FIG. 1 l(d)(i). 
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From this figure, it can be seen that the three speech signals were well separated from the 
mixed signals. Once again, an average of 1 ldB SNR enhancement was obtained. 

From these two examples, it can be seen that this technique works well both for different 
types of noises and different speech signals (languages and speakers). Note that even 
though for mathematical simplicity in the derivations we assumed noise as Gaussian 
noise, the present invention performs well for non-Gaussian noise as indicated by the 
cases illustrated by FIG. 10. 
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